Abstract

This paper studies the recovery of a superposition of point sources from noisy bandlimited data. In the fewest possible words, we only have information about the spectrum of an object in the low-frequency band $[-f_{lo}, f_{lo}]$ and seek to obtain a higher resolution estimate by extrapolating the spectrum up to a frequency $f_{hi} > f_{lo}$. We show that as long as the sources are separated by $2/f_{lo}$, solving a simple convex program produces a stable estimate in the sense that the approximation error between the higher-resolution reconstruction and the truth is proportional to the noise level times the square of the super-resolution factor (SRF) $f_{hi}/f_{lo}$.

Keywords. Deconvolution, stable signal recovery, sparsity, line spectra estimation, basis mismatch, super-resolution factor.



1 Introduction 

It is often of great interest to study the fine details of a signal at a scale beyond the resolution provided by the available measurements. In a general sense, super-resolution techniques seek to recover high-resolution information from coarse-scale measurements. There is a vast literature on this subject, as researchers continually look for ways of breaking the diffraction limit, a fundamental limit on the possible resolution imposed by most imaging systems. Examples of applications include conventional optical imaging [18], astronomy [25], medical imaging [10], and microscopy [20]. In electronic imaging, photon shot noise limits the pixel size, making super-resolution techniques necessary to recover sub-pixel details [21, 23]. Among other fields demanding and developing super-resolution techniques, one could cite spectroscopy [11], radar [22], non-optical medical imaging [15] and geophysics [16].

In many of these applications, the signal we wish to super-resolve is a superposition of point sources; depending upon the situation, these may be celestial bodies in astronomy [19], molecules in fluorescence microscopy [19], or line spectra in speech analysis [14]. In the companion article [4], the authors studied the problem of deconvolving point sources from low-pass measurements. Whereas [4] focused mostly on the noiseless setting, in which one has perfect low-frequency information, this paper extends previous results by considering the noisy setting in which data are contaminated with noise, a situation which is unavoidable in practice. In a nutshell, [4] proves that with noiseless data, one can recover a superposition of point sources exactly, namely, with arbitrarily high accuracy, by solving a simple convex program. This phenomenon holds as long as the spacing between the sources is on the order of the resolution limit. With noisy data, it is of course no longer possible to achieve infinite precision. In fact, suppose the noise level and sensing resolution are fixed. Then one expects that it will become increasingly harder to recover the fine details of the signal as the scale of these features becomes finer. The goal of this paper is to make this vague statement mathematically precise; we shall characterize the estimation error as a function of the noise level and of the resolution we seek to achieve. As we shall see next, increasing resolution essentially means filling in parts of the missing spectrum.

Figure 1: Sketch of the super-resolution factor (SRF). A signal (left) is measured at a low resolution by a convolution with a kernel (top middle) of width $\lambda_{lo}$ (top right). Super-resolution aims at approximating the outcome of a convolution with a much narrower kernel (bottom middle) of width $\lambda_{hi}$. Hence, the goal is to recover the bottom right curve.

1.1 The super-resolution problem 

To formalize matters, we have observations about an object $x$ of the form

$$y(t) = (Q_{lo}\, x)(t) + z(t), \tag{1.1}$$

where $t$ is a continuous parameter (time, space, and so on) belonging to the $d$-dimensional cube $[0,1]^d$. Above, $z$ is a noise term which can either be stochastic or deterministic, and $Q_{lo}$ is a bandlimiting operator with a frequency cut-off equal to $f_{lo} = 1/\lambda_{lo}$. Here, $\lambda_{lo}$ is a positive parameter representing the finest scale at which $x$ is observed. To make this more precise, we take $Q_{lo}$ to be a low-pass filter of width $\lambda_{lo}$ as illustrated at the top of Figure 1; that is,

$$(Q_{lo}\, x)(t) = (K_{lo} * x)(t),$$

such that in the frequency domain the convolution equation becomes

$$\widehat{(Q_{lo}\, x)}(f) = \hat K_{lo}(f)\, \hat x(f), \quad f \in \mathbb{Z}^d,$$

where $\hat x(f) = \int e^{-i2\pi \langle f, t \rangle} x(dt)$ is the usual Fourier transform. The low-pass kernel $\hat K_{lo}(f)$ vanishes outside of the cell $[-f_{lo}, f_{lo}]^d$.

Our goal is to resolve the signal $x$ at a finer scale $\lambda_{hi} \ll \lambda_{lo}$. In other words, we would like to obtain a high-resolution estimate $x_{est}$ such that $Q_{hi}\, x_{est} \approx Q_{hi}\, x$, where $Q_{hi}$ is a bandlimiting operator with cut-off frequency $f_{hi} = 1/\lambda_{hi} > f_{lo}$. This is illustrated at the bottom of Figure 1, which shows the convolution between $K_{hi}$ and $x$. A different way to pose the problem is as follows: we have noisy data about the spectrum of an object of interest in the low-pass band $[-f_{lo}, f_{lo}]$, and would like to estimate the spectrum in the possibly much wider band $[-f_{hi}, f_{hi}]$. We introduce the super-resolution factor (SRF) as:

$$\text{SRF} = \frac{f_{hi}}{f_{lo}} = \frac{\lambda_{lo}}{\lambda_{hi}}; \tag{1.2}$$

in words, we wish to double the resolution if the SRF is equal to two, to quadruple it if the SRF equals four, and so on. Given the notorious ill-posedness of spectral extrapolation, a natural question is how small the error $K_{hi} * (x_{est} - x)$ between the estimated and the true super-resolved signal at scale $\lambda_{hi}$ can be. In particular, how does it scale with both the noise level and the SRF? This paper addresses this important question.



1.2 Models and methods 

As mentioned earlier, we are interested in superpositions of point sources modeled as

$$x = \sum_j a_j \delta_{t_j},$$

where $\{t_j\}$ are points from the interval $[0, 1]$, $\delta_\tau$ is a Dirac measure located at $\tau$, and the amplitudes $a_j$ may be complex valued. Although we focus on the one-dimensional case, our methods extend in a straightforward manner to the multidimensional case, as we shall make precise later on. We assume the model (1.1) in which $t \in [0, 1]$, which from now on we identify with the unit circle $\mathbb{T}$, and $z(t)$ is a bandlimited error term obeying

$$\|z\|_{L_1} = \int_{\mathbb{T}} |z(t)|\, dt \le \delta. \tag{1.3}$$

The measurement error $z$ is otherwise arbitrary and can be adversarial. For concreteness, we set $K_{lo}$ to be the periodic Dirichlet kernel

$$K_{lo}(t) = \sum_{k=-f_{lo}}^{f_{lo}} e^{i2\pi k t} = \frac{\sin(\pi(2f_{lo}+1)t)}{\sin(\pi t)}. \tag{1.4}$$

By definition, for each $f \in \mathbb{Z}$, this kernel obeys $\hat K_{lo}(f) = 1$ if $|f| \le f_{lo}$ whereas $\hat K_{lo}(f) = 0$ if $|f| > f_{lo}$.

We emphasize, however, that our results hold for other low-pass filters. Indeed, our model (1.1) can be equivalently written in the frequency domain as $\hat y(f) = \hat x(f) + \hat z(f)$, $|f| \le f_{lo}$. Hence, if the measurements are of the form $y = G_{lo} * x + z$ for some other low-pass kernel $G_{lo}$, then the model can be written as $\hat y(f) = \hat x(f) + \hat z(f)/\hat G_{lo}(f)$, so that we have a very similar formulation. We omit the straightforward details.
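The frequency-domain form of the model can be illustrated with a minimal sketch, using only the Python standard library. It computes the low-pass coefficients $\hat x(k)$ of a spike train and evaluates the projection $Q_{lo}\, x$ from them. The function names and the dictionary representation of the spectrum are illustrative choices, not notation from the paper.

```python
import cmath
import math


def lowpass_spectrum(locations, amplitudes, f_lo):
    """Fourier coefficients x_hat(k) = sum_j a_j exp(-i 2 pi k t_j), |k| <= f_lo."""
    return {
        k: sum(a * cmath.exp(-2j * math.pi * k * t)
               for t, a in zip(locations, amplitudes))
        for k in range(-f_lo, f_lo + 1)
    }


def lowpass_projection(t, spectrum):
    """(Q_lo x)(t) = sum_{|k| <= f_lo} x_hat(k) exp(i 2 pi k t)."""
    return sum(c * cmath.exp(2j * math.pi * k * t) for k, c in spectrum.items())


def dirichlet_sum(t, f_lo):
    """Dirichlet kernel written as the sum of exponentials in (1.4)."""
    return sum(cmath.exp(2j * math.pi * k * t) for k in range(-f_lo, f_lo + 1))


def dirichlet_closed_form(t, f_lo):
    """Closed form of (1.4); valid when t is not an integer."""
    return math.sin(math.pi * (2 * f_lo + 1) * t) / math.sin(math.pi * t)


# A single spike of amplitude 2 at t = 0.3, observed up to f_lo = 10.
spectrum = lowpass_spectrum([0.3], [2.0], 10)
# Its low-pass projection is the shifted, scaled kernel 2 * K_lo(t - 0.3).
value = lowpass_projection(0.45, spectrum)
```

For a single spike, the projection reduces to a shifted and scaled copy of the Dirichlet kernel, which is a quick way to check both the sign convention of the Fourier transform and the closed form in (1.4).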

To perform recovery, we propose solving

$$\min_{\tilde x} \ \|\tilde x\|_{TV} \quad \text{subject to} \quad \|Q_{lo}\, \tilde x - y\|_{L_1} \le \delta. \tag{1.5}$$

Above, $\|x\|_{TV}$ is the total-variation norm of a measure (see Chapter 6 of [27] or Appendix A in [4]), which can be interpreted as the generalization of the $\ell_1$ norm to the real line. (If $x$ is a probability measure, then $\|x\|_{TV} = 1$.) This is not to be confused with the total variation of a function, a popular regularizer in signal processing and computer vision. Lastly, it is important to observe that the recovery algorithm is completely agnostic to the target resolution $\lambda_{hi}$, so our results hold simultaneously for any value of $\lambda_{hi} < \lambda_{lo}$.

1.3 Main result 

Our objective is to approximate the signal up until a certain resolution determined by the width of the smoothing kernel $\lambda_{hi} < \lambda_{lo}$ used to compute the error. To fix ideas, we set

$$K_{hi}(t) = \frac{1}{f_{hi}+1} \sum_{k=-f_{hi}}^{f_{hi}} (f_{hi} + 1 - |k|)\, e^{i2\pi k t} = \frac{1}{f_{hi}+1} \left( \frac{\sin(\pi (f_{hi}+1) t)}{\sin(\pi t)} \right)^2 \tag{1.6}$$

to be the Fejér kernel with cut-off frequency $f_{hi} = 1/\lambda_{hi}$. Figure 2 shows this kernel together with its spectrum.

Figure 2: The Fejér kernel (1.6) (a) with half width about $\lambda_{hi}$, and its Fourier series coefficients (b). The kernel is bandlimited since the Fourier coefficients vanish beyond the cut-off frequency $f_{hi}$.
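As a sanity check on the two expressions for the Fejér kernel in (1.6), the following sketch evaluates both the weighted sum of exponentials and the closed form. The function names are hypothetical and only the Python standard library is used.

```python
import cmath
import math


def fejer_sum(t, f_hi):
    """Fejer kernel as (1 / (f_hi + 1)) * sum_k (f_hi + 1 - |k|) exp(i 2 pi k t)."""
    n = f_hi + 1
    total = sum((n - abs(k)) * cmath.exp(2j * math.pi * k * t)
                for k in range(-f_hi, f_hi + 1))
    # The imaginary parts cancel by symmetry in k, so only the real part is kept.
    return (total / n).real


def fejer_closed_form(t, f_hi):
    """Closed form (1 / (f_hi + 1)) * (sin(pi (f_hi + 1) t) / sin(pi t))^2, t not an integer."""
    n = f_hi + 1
    return (math.sin(math.pi * n * t) / math.sin(math.pi * t)) ** 2 / n


peak = fejer_sum(0.0, 10)          # the kernel peaks at f_hi + 1 at the origin
gap = abs(fejer_sum(0.07, 10) - fejer_closed_form(0.07, 10))
```

The zeroth Fourier coefficient of the kernel equals one, so the kernel integrates to one over the circle, while its peak value grows like $f_{hi} + 1$; this is why its effective width is on the order of $\lambda_{hi}$.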

As explained in Section 3.2 of [4], no matter what method is used to achieve super-resolution, it is necessary to introduce a condition about the support of the signal, which prevents the sources from being too close to each other. Otherwise, the problem is easily shown to be hopelessly ill-posed by leveraging Slepian's work on prolate spheroidal sequences [31]. In this paper, we use the notion of minimum separation.

Definition 1.1 (Minimum separation) For a family of points $T \subset \mathbb{T}$, the minimum separation is defined as the closest distance between any two elements from $T$,

$$\Delta(T) = \inf_{(t, t') \in T \,:\, t \neq t'} |t - t'|.$$
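A minimal sketch of Definition 1.1 follows. It assumes that distances on the circle $\mathbb{T}$ are measured with wrap-around, which is an assumption on our part since the definition above only writes $|t - t'|$. The helper names are hypothetical.

```python
def minimum_separation(points):
    """Smallest wrap-around distance between two distinct points of [0, 1)."""
    if len(points) < 2:
        return float("inf")
    pts = sorted(p % 1.0 for p in points)
    # On the circle, the closest pair is always adjacent in sorted order,
    # counting the gap that wraps from the last point back to the first.
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(1.0 - pts[-1] + pts[0])
    return min(gaps)


def satisfies_separation(points, f_lo):
    """Check the condition Delta(T) >= 2 * lambda_lo with lambda_lo = 1 / f_lo."""
    return minimum_separation(points) >= 2.0 / f_lo


support = [0.1, 0.5, 0.95]
delta_T = minimum_separation(support)   # closest pair is 0.95 and 0.1 across the wrap
```

With this support, the closest pair straddles the point where the interval wraps around, so ignoring the wrap-around gap would overestimate the separation.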



Our model (1.3) asserts that we can achieve a low-resolution error obeying

$$\|K_{lo} * (x_{est} - x)\|_{L_1} \le \delta,$$

but that we cannot do better as well. The main question is: how does this degrade when we substitute the low-resolution with the high-resolution kernel?

Theorem 1.2 Assume that the support $T$ of $x$ obeys the separation condition

$$\Delta(T) \ge 2\lambda_{lo}. \tag{1.7}$$

Then under the noise model (1.3), any solution $x_{est}$ to problem (1.5)¹ obeys

$$\|K_{hi} * (x_{est} - x)\|_{L_1} \le C_0\, \text{SRF}^2\, \delta,$$

where $C_0$ is a positive numerical constant.

¹ To be precise, the theorem holds for any feasible point $\tilde x$ obeying $\|\tilde x\|_{TV} \le \|x\|_{TV}$; this set is not empty since it contains $x$.

Thus, minimizing the total-variation norm subject to data constraints yields a stable approximation of any superposition of Dirac measures obeying the minimum-separation condition. When $z = 0$, setting $\delta = 0$ and letting $\text{SRF} \to \infty$, this recovers the result in [4] which shows that $x_{est} = x$, i.e. we achieve infinite precision. What is interesting here is the quadratic dependence of the estimation error on the super-resolution factor.

It goes without saying that Theorem 1.2 can also be specialized to a stochastic noise model. Suppose we observe noisy samples of the spectrum

$$\eta(k) = \int_{\mathbb{T}} e^{-i2\pi k t}\, x(dt) + \epsilon_k, \quad k = -f_{lo}, -f_{lo}+1, \ldots, f_{lo}, \tag{1.8}$$

where $\epsilon_k$ is an iid sequence of complex-valued $\mathcal{N}(0, \sigma^2)$ variables (this means that the real and imaginary parts are independent $\mathcal{N}(0, \sigma^2)$ variables). This is equivalent to our model (1.1) with

$$z(t) = \sum_{k=-f_{lo}}^{f_{lo}} \epsilon_k\, e^{i2\pi k t}.$$

We have $\|z\|_{L_1} \le \|z\|_{L_2}$ and $\|z\|_{L_2} = \|\epsilon\|_{\ell_2}$ by Parseval. Further, $\|\epsilon\|_{\ell_2}^2 / \sigma^2$ follows a $\chi^2$-distribution with $4f_{lo} + 2$ degrees of freedom. As a result, a concentration inequality (see [17, Section 4]) yields

$$\mathbb{P}\left( \|\epsilon\|_{\ell_2} > (1+\gamma)\, \sigma \sqrt{4f_{lo}+2} \right) < e^{-2 f_{lo} \gamma^2},$$

for any positive $\gamma$. This gives the following corollary.

Corollary 1.3 Fix $\gamma > 0$. Under the stochastic noise model (1.8), taking $\delta = (1+\gamma)\, \sigma \sqrt{4f_{lo}+2}$ yields

$$\|K_{hi} * (x_{est} - x)\|_{L_1} \le C_0\, (1+\gamma) \sqrt{4f_{lo}+2}\ \text{SRF}^2\, \sigma \tag{1.9}$$

with probability at least $1 - e^{-2 f_{lo} \gamma^2}$.
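To make the parameter choice in Corollary 1.3 concrete, here is a small sketch that evaluates the noise level $\delta$, the corresponding lower bound on the success probability, and the right-hand side of (1.9) up to the unspecified constant $C_0$. The function names are hypothetical; the formulas are the ones stated in the corollary.

```python
import math


def noise_level_delta(sigma, f_lo, gamma):
    """delta = (1 + gamma) * sigma * sqrt(4 f_lo + 2), as chosen in Corollary 1.3."""
    return (1.0 + gamma) * sigma * math.sqrt(4 * f_lo + 2)


def success_probability_bound(f_lo, gamma):
    """Lower bound 1 - exp(-2 f_lo gamma^2) on the probability that (1.9) holds."""
    return 1.0 - math.exp(-2.0 * f_lo * gamma ** 2)


def error_bound_over_c0(sigma, f_lo, f_hi, gamma):
    """Right-hand side of (1.9) divided by the numerical constant C_0."""
    srf = f_hi / f_lo
    return noise_level_delta(sigma, f_lo, gamma) * srf ** 2


delta = noise_level_delta(sigma=0.01, f_lo=50, gamma=0.5)
prob = success_probability_bound(f_lo=50, gamma=0.5)
bound = error_bound_over_c0(sigma=0.01, f_lo=50, f_hi=200, gamma=0.5)
```

Even a moderate $\gamma$ gives a success probability extremely close to one once $f_{lo}$ is in the tens, so in this regime the bound is driven by $\sigma \sqrt{4 f_{lo} + 2}$ and the squared SRF.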



1.4 Extensions 

Other high-resolution kernels. We work with the high-resolution Fejér kernel but our results would hold with just about any other symmetric kernel as long as the kernel obeys the properties (1.10) and (1.11) below, as the proof only uses these simple estimates. The first reads

$$\int_{\mathbb{T}} |K_{hi}(t)|\, dt \le C_0, \qquad \int_{\mathbb{T}} |K'_{hi}(t)|\, dt \le C_1 \lambda_{hi}^{-1}, \qquad \sup_t |K''_{hi}(t)| \le C_2 \lambda_{hi}^{-3}, \tag{1.10}$$

where $C_0$, $C_1$ and $C_2$ are positive constants independent of $\lambda_{hi}$. The second is that there exists a nonnegative and nonincreasing function $f : [0, 1/2] \to \mathbb{R}$ such that

$$|K''_{hi}(t + \lambda_{hi})| \le f(t), \quad 0 \le t \le 1/2,$$

and

$$\int_0^{1/2} f(t)\, dt \le C_3 \lambda_{hi}^{-2}. \tag{1.11}$$

This is to make sure that (2.6) holds. (For the Fejér kernel, we can take $f$ to be quadratic in $[0, 1/2 - \lambda_{hi}]$ and constant in $[1/2 - \lambda_{hi}, 1/2]$.)

Higher dimensions. Our techniques can be applied to establish robustness guarantees for the recovery of point sources in higher dimensions. The only parts of the proof of Theorem 1.2 that do not generalize directly are Lemmas 2.4, 2.5 and 2.7. However, the methods used to prove these lemmas can be extended without much difficulty to multiple dimensions as described in Section C of the Appendix.

Spectral line estimation. Swapping time and frequency, Theorem 1.2 can be immediately applied to the estimation of spectral lines in which we observe

$$y(t) = \sum_j a_j e^{i2\pi \omega_j t} + z(t), \quad t = 0, 1, \ldots, n-1,$$

where $z$ is a noise term. Here, our work implies that a nonparametric method based on convex optimization is capable of approximating the spectrum of a multitone signal with arbitrary frequencies, as long as these frequencies are sufficiently far apart, and furthermore that the reconstruction is stable. In this setting, the smoothed error can be interpreted as the recovery error windowed at a certain spectral resolution.



1.5 Related work 

Since at least the work of Prony [24], parametric methods based on polynomial rooting have been a popular approach to the super-resolution of trains of spikes and, equivalently, of line spectra. These techniques are typically based on the eigendecomposition of a sample covariance matrix of the data [3, 26]. A statistical analysis of MUSIC [2, 28], a popular algorithm following this principle, can be found in [32] along with performance limits for any unbiased estimate based on a Cramér-Rao bound. More precise analysis has been carried out for models with a reduced number of parameters, yielding, for instance, a characterization of the trade-off between resolution and signal-to-noise ratio for the detection of two closely-spaced line spectra [30] or light sources [12, 29]. In general, parametric techniques require prior knowledge of the model order and rely heavily on the assumption that the noise is white or at least has known spectrum (see Chapter 4 of [35]). An alternative approach that overcomes the latter drawback is to perform nonlinear least squares estimation of the model parameters [36]. Unfortunately, the resulting optimization problem has an extremely multimodal cost function, which makes it very sensitive to initialization [34]. Nonparametric methods based on convex programming do not require knowledge of the model order and are guaranteed to converge to a global optimum. However, previous theoretical work on the stability of this approach was limited to a discrete and finite-dimensional setting, where the support of the signal of interest is restricted to a finer uniform grid [4]. Other analyses of the super-resolution problem in the presence of noise also focus on signals supported on a grid [6, 31].

The total-variation norm is the continuous analog of the $\ell_1$ norm for finite dimensional vectors so that our recovery algorithm can be interpreted as finding the shortest linear combination, in an $\ell_1$ sense, of elements taken from a continuous and infinite dictionary. However, except for [4], previous stability results for sparse recovery in redundant dictionaries do not apply even if we discretize the dictionary; this is due to the high coherence between the elements. Moreover, working with a discrete dictionary can easily degrade the quality of the estimate [5] (see [33] for a related discussion concerning grid selection for spectral analysis). This observation has spurred the appearance of modified compressed-sensing techniques specifically tailored to the task of spectral estimation [7, 9, 13]. Proving stability guarantees for these methods under conditions on the support or the dynamic range of the signal is an interesting research direction.



2 Proof of Theorem 1.2 



It is useful to first introduce various objects we shall need in the course of the proof. We let $T = \{t_j\}$ be the support of $x$ and define the disjoint subsets

$$S^{\lambda}_{\text{near}}(j) := \{t : |t - t_j| \le 0.16\lambda\},$$

$$S^{\lambda}_{\text{far}} := \{t : |t - t_j| > 0.16\lambda, \ \forall t_j \in T\};$$

here, $\lambda \in \{\lambda_{lo}, \lambda_{hi}\}$, and $j$ ranges from 1 to $|T|$. We write the union of the sets $S^{\lambda}_{\text{near}}(j)$ as

$$S^{\lambda}_{\text{near}} := \bigcup_{j=1}^{|T|} S^{\lambda}_{\text{near}}(j)$$

and observe that the pair $(S^{\lambda}_{\text{near}}, S^{\lambda}_{\text{far}})$ forms a partition of $\mathbb{T}$. The value of the constant 0.16 is not important and chosen merely to simplify the argument. We denote the restriction of a measure $\mu$ with finite total variation on a set $S$ by $P_S\, \mu$ (note that in contrast we denote the low-pass projection in the frequency domain by $Q_{lo}$). This restriction is well defined for the above sets, as one can take the Lebesgue decomposition of $\mu$ with respect to a positive $\sigma$-finite measure supported on any of them [27]. To keep some expressions in compact form, we set

$$I_{S^{\lambda}_{\text{near}}(j)}(\mu) := \frac{1}{\lambda_{lo}^2} \int_{S^{\lambda}_{\text{near}}(j)} (t - t_j)^2\, |\mu|(dt), \qquad I_{S^{\lambda}_{\text{near}}}(\mu) := \sum_{t_j \in T} I_{S^{\lambda}_{\text{near}}(j)}(\mu)$$

for any measure $\mu$ and $\lambda \in \{\lambda_{lo}, \lambda_{hi}\}$. Finally, we reserve the symbol $C$ to denote a numerical constant whose value may change at each occurrence.
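The partition into near and far regions can be sketched as follows. This is only an illustration: it uses wrap-around distances on the circle (an assumption), zero-based indices rather than the paper's $j = 1, \ldots, |T|$, and hypothetical function names.

```python
def wraparound_distance(s, t):
    """Distance between s and t on the unit circle [0, 1)."""
    d = abs(s - t) % 1.0
    return min(d, 1.0 - d)


def near_index(t, support, lam):
    """Index j with t in S_near^lam(j), or None when t lies in S_far^lam."""
    for j, t_j in enumerate(support):
        if wraparound_distance(t, t_j) <= 0.16 * lam:
            return j
    return None


support = [0.2, 0.6]            # separation 0.4, well above 2 * lam_lo
lam_lo, lam_hi = 0.1, 0.02      # super-resolution factor 5
coarse = near_index(0.21, support, lam_lo)   # inside the near set of t_0 at scale lam_lo
fine = near_index(0.21, support, lam_hi)     # already in the far set at scale lam_hi
```

The same point can be near the support at scale $\lambda_{lo}$ and far from it at scale $\lambda_{hi}$, which is why the proof tracks both partitions separately.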

Set $h = x - x_{est}$. The error obeys

$$\|Q_{lo}\, h\|_{L_1} \le \|Q_{lo}\, x - y\|_{L_1} + \|y - Q_{lo}\, x_{est}\|_{L_1} \le 2\delta,$$

and has bounded total-variation norm since $\|h\|_{TV} \le \|x\|_{TV} + \|x_{est}\|_{TV} \le 2\|x\|_{TV}$. Our aim is to bound the $L_1$ norm of the smoothed error $e := K_{hi} * h$,

$$\|e\|_{L_1} = \int_{\mathbb{T}} \left| \int_{\mathbb{T}} K_{hi}(t - \tau)\, h(d\tau) \right| dt.$$

We begin with a lemma bounding the total-variation norm of $h$ 'away' from $T$.

Lemma 2.1 Under the conditions of Theorem 1.2, there exist positive constants $C_a$ and $C_b$ such that

$$\left\| P_{S^{\lambda_{lo}}_{\text{far}}}(h) \right\|_{TV} + I_{S^{\lambda_{lo}}_{\text{near}}}(h) \le C_a\, \delta,$$

$$\left\| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right\|_{TV} \le C_b\, \text{SRF}^2\, \delta.$$

This lemma is proved in Section 2.1 and relies on the existence of a low-frequency dual polynomial constructed in [4] to guarantee exact recovery in the noiseless setting.

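To develop a bound about $\|e\|_{L_1}$, we begin by applying the triangle inequality to obtain

$$|e(t)| = \left| \int_{\mathbb{T}} K_{hi}(t-\tau)\, h(d\tau) \right| \le \left| \int_{S^{\lambda_{hi}}_{\text{far}}} K_{hi}(t-\tau)\, h(d\tau) \right| + \left| \int_{S^{\lambda_{hi}}_{\text{near}}} K_{hi}(t-\tau)\, h(d\tau) \right|. \tag{2.1}$$

By a corollary of the Radon-Nikodym Theorem (see Theorem 6.12 in [27]), it is possible to perform the polar decomposition $P_{S^{\lambda_{hi}}_{\text{far}}}(h)(d\tau) = e^{i2\pi\theta(\tau)} \left| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right|(d\tau)$ such that $\theta(\tau)$ is a real function and $\left| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right|$ is a positive measure. Then

$$\int_{\mathbb{T}} \left| \int_{S^{\lambda_{hi}}_{\text{far}}} K_{hi}(t-\tau)\, h(d\tau) \right| dt \le \int_{\mathbb{T}} \int_{S^{\lambda_{hi}}_{\text{far}}} |K_{hi}(t-\tau)| \left| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right|(d\tau)\, dt = \int_{S^{\lambda_{hi}}_{\text{far}}} \left( \int_{\mathbb{T}} |K_{hi}(t-\tau)|\, dt \right) \left| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right|(d\tau) \le C_0 \left\| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right\|_{TV}, \tag{2.2}$$

where we have applied Fubini's theorem and (1.10) (note that the total-variation norm of $\left| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right|$ is bounded by $2\|x\|_{TV} < \infty$).

In order to control the second term in the right-hand side of (2.1), we use a first-order approximation of the super-resolution kernel provided by the Taylor series expansion of $\psi(\tau) = K_{hi}(t - \tau)$ around $t_j$: for any $\tau$ such that $|\tau - t_j| < 0.16\lambda_{hi}$, we have

$$\left| K_{hi}(t-\tau) - K_{hi}(t-t_j) - K'_{hi}(t-t_j)(t_j - \tau) \right| \le \sup_{u \,:\, |t - t_j - u| \le 0.16\lambda_{hi}} \frac{1}{2} |K''_{hi}(u)|\, (\tau - t_j)^2.$$

Applying this together with the triangle inequality, and setting $t_j = 0$ without loss of generality, gives

$$\int_{\mathbb{T}} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} K_{hi}(t-\tau)\, h(d\tau) \right| dt \le \int_{\mathbb{T}} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} K_{hi}(t)\, h(d\tau) \right| dt + \int_{\mathbb{T}} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} K'_{hi}(t)\, \tau\, h(d\tau) \right| dt + \frac{1}{2} \int_{\mathbb{T}} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} \sup_{|t-u| \le 0.16\lambda_{hi}} |K''_{hi}(u)|\, \tau^2\, |h|(d\tau) \right| dt. \tag{2.3}$$

(To be clear, we do not lose generality by setting $t_j = 0$ since the analysis is invariant by translation; in particular by a translation placing $t_j$ at the origin. To keep things as simple as possible, we shall make frequent use of this argument.) We then combine Fubini's theorem with (1.10) to obtain

$$\int_{\mathbb{T}} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} K_{hi}(t)\, h(d\tau) \right| dt \le \int_{\mathbb{T}} |K_{hi}(t)|\, dt \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} h(d\tau) \right| \le C_0 \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} h(d\tau) \right| \tag{2.4}$$

and

$$\int_{\mathbb{T}} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} K'_{hi}(t)\, \tau\, h(d\tau) \right| dt \le \int_{\mathbb{T}} |K'_{hi}(t)|\, dt \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} \tau\, h(d\tau) \right| \le \frac{C_1}{\lambda_{hi}} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} \tau\, h(d\tau) \right|. \tag{2.5}$$

Some simple calculations show that (1.10) and (1.11) imply

$$\int_{\mathbb{T}} \sup_{|t-u| \le 0.16\lambda_{hi}} |K''_{hi}(u)|\, dt \le \frac{C_4}{\lambda_{hi}^2} \tag{2.6}$$

for a positive constant $C_4$. This together with Fubini's theorem yields

$$\int_{\mathbb{T}} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} \sup_{|t-u| \le 0.16\lambda_{hi}} |K''_{hi}(u)|\, \tau^2\, |h|(d\tau) \right| dt \le \int_{\mathbb{T}} \sup_{|t-u| \le 0.16\lambda_{hi}} |K''_{hi}(u)|\, dt \int_{S^{\lambda_{hi}}_{\text{near}}(j)} \tau^2\, |h|(d\tau) \le C_4\, \text{SRF}^2\, I_{S^{\lambda_{hi}}_{\text{near}}(j)}(h). \tag{2.7}$$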

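In order to make use of these bounds, it is necessary to control the local action of the measure $h$ on a constant and a linear function. The following two lemmas are proved in Sections 2.2 and 2.3.

Lemma 2.2 Take $T$ as in Theorem 1.2 and any measure $h$ obeying $\|Q_{lo}\, h\|_{L_1} \le 2\delta$. Then

$$\sum_{t_j \in T} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} h(d\tau) \right| \le 2\delta + \left\| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right\|_{TV} + C\, I_{S^{\lambda_{hi}}_{\text{near}}}(h).$$

Lemma 2.3 Take $T$ as in Theorem 1.2 and any measure $h$ obeying $\|Q_{lo}\, h\|_{L_1} \le 2\delta$. Then

$$\sum_{t_j \in T} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} (\tau - t_j)\, h(d\tau) \right| \le C \left( \lambda_{lo}\, \delta + \lambda_{lo} \left\| P_{S^{\lambda_{lo}}_{\text{far}}}(h) \right\|_{TV} + \lambda_{lo}\, I_{S^{\lambda_{lo}}_{\text{near}}}(h) + \lambda_{hi}\, \text{SRF}^2\, I_{S^{\lambda_{lo}}_{\text{near}}}(h) \right).$$

We may now conclude the proof of our main theorem. Indeed, the inequalities (2.2), (2.3), (2.4), (2.5) and (2.7) together with $I_{S^{\lambda_{hi}}_{\text{near}}}(h) \le I_{S^{\lambda_{lo}}_{\text{near}}}(h)$ imply

$$\|e\|_{L_1} \le C \left( \text{SRF}\, \delta + \left\| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right\|_{TV} + \text{SRF} \left\| P_{S^{\lambda_{lo}}_{\text{far}}}(h) \right\|_{TV} + \text{SRF}^2\, I_{S^{\lambda_{lo}}_{\text{near}}}(h) \right) \le C\, \text{SRF}^2\, \delta,$$

where the second inequality follows from Lemma 2.1.

2.1 Proof of Lemma 2.1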



The proof relies on the existence of a certain low-frequency polynomial, and we first recall Proposition 2.1 and Lemma 2.5 from [4].

Lemma 2.4 Suppose $T$ obeys the separation condition (1.7) and take any $v \in \mathbb{C}^{|T|}$ with $|v_j| = 1$. Then there exists a low-frequency trigonometric polynomial

$$q(t) = \sum_{k=-f_{lo}}^{f_{lo}} c_k\, e^{i2\pi k t}$$

obeying the following properties:

$$q(t_j) = v_j, \quad t_j \in T, \tag{2.8}$$

$$|q(t)| \le 1 - \frac{C_a (t - t_j)^2}{\lambda_{lo}^2}, \quad t \in S^{\lambda_{lo}}_{\text{near}}(j), \tag{2.9}$$

$$|q(t)| < 1 - C_b, \quad t \in S^{\lambda_{lo}}_{\text{far}}, \tag{2.10}$$

with $0 < C_b \le 0.16^2\, C_a < 1$.



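Invoking a corollary of the Radon-Nikodym Theorem (see Theorem 6.12 in [27]), it is possible to perform a polar decomposition of $P_T h$,

$$P_T h = e^{i\phi(t)}\, |P_T h|,$$

such that $\phi(t)$ is a real function defined on $\mathbb{T}$. To prove Lemma 2.1, we work with $v_j = e^{-i\phi(t_j)}$. Since $q$ is low frequency,

$$\left| \int_{\mathbb{T}} q(t)\, dh(t) \right| = \left| \int_{\mathbb{T}} q(t)\, Q_{lo} h(t)\, dt \right| \le \|q\|_{L_\infty} \|Q_{lo}\, h\|_{L_1} \le 2\delta. \tag{2.11}$$

Next, since $q$ interpolates $e^{-i\phi(t)}$ on $T$,

$$\|P_T h\|_{TV} = \int_{\mathbb{T}} q(t)\, P_T h(dt) \le \left| \int_{\mathbb{T}} q(t)\, h(dt) \right| + \left| \int_{T^c} q(t)\, h(dt) \right| \le 2\delta + \sum_{t_j \in T} \left| \int_{S^{\lambda_{lo}}_{\text{near}}(j) \setminus \{t_j\}} q(t)\, h(dt) \right| + \left| \int_{S^{\lambda_{lo}}_{\text{far}}} q(t)\, h(dt) \right|. \tag{2.12}$$

Applying (2.10) in Lemma 2.4 and Hölder's inequality, we obtain

$$\left| \int_{S^{\lambda_{lo}}_{\text{far}}} q(t)\, h(dt) \right| \le \left\| P_{S^{\lambda_{lo}}_{\text{far}}}\, q \right\|_{L_\infty} \left\| P_{S^{\lambda_{lo}}_{\text{far}}}(h) \right\|_{TV} \le (1 - C_b) \left\| P_{S^{\lambda_{lo}}_{\text{far}}}(h) \right\|_{TV}. \tag{2.13}$$

Set $t_j = 0$ without loss of generality. The triangle inequality and (2.9) in Lemma 2.4 yield

$$\left| \int_{S^{\lambda_{lo}}_{\text{near}}(j) \setminus \{0\}} q(t)\, h(dt) \right| \le \int_{S^{\lambda_{lo}}_{\text{near}}(j) \setminus \{0\}} |q(t)|\, |h|(dt) \le \int_{S^{\lambda_{lo}}_{\text{near}}(j) \setminus \{0\}} \left( 1 - \frac{C_a t^2}{\lambda_{lo}^2} \right) |h|(dt) \le \int_{S^{\lambda_{lo}}_{\text{near}}(j) \setminus \{0\}} |h|(dt) - C_a\, I_{S^{\lambda_{lo}}_{\text{near}}(j)}(h). \tag{2.14}$$

Combining (2.12), (2.13) and (2.14) gives

$$\|P_T h\|_{TV} \le 2\delta + \|P_{T^c} h\|_{TV} - C_b \left\| P_{S^{\lambda_{lo}}_{\text{far}}}(h) \right\|_{TV} - C_a\, I_{S^{\lambda_{lo}}_{\text{near}}}(h).$$

Observe that we can substitute $\lambda_{lo}$ with $\lambda_{hi}$ in (2.12) and (2.14) and obtain

$$\|P_T h\|_{TV} \le 2\delta + \|P_{T^c} h\|_{TV} - 0.16^2\, C_a\, \text{SRF}^{-2} \left\| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right\|_{TV} - C_a\, I_{S^{\lambda_{hi}}_{\text{near}}}(h).$$

This follows from using (2.9) instead of (2.10) to bound the magnitude of $q$ on $S^{\lambda_{hi}}_{\text{far}}$.

These inequalities can be interpreted as a generalization of the strong null-space property used to obtain stability guarantees for super-resolution on a discrete grid (see Lemma 3.1 in [4]). Combined with the fact that $x_{est} = x - h$ has minimal total-variation norm among all feasible points, and that $x$ is supported on $T$, they yield

$$\|x\|_{TV} \ge \|x - h\|_{TV} \ge \|x\|_{TV} - \|P_T h\|_{TV} + \|P_{T^c} h\|_{TV} \ge \|x\|_{TV} - 2\delta + C_b \left\| P_{S^{\lambda_{lo}}_{\text{far}}}(h) \right\|_{TV} + C_a\, I_{S^{\lambda_{lo}}_{\text{near}}}(h).$$

As a result, we conclude that

$$C_b \left\| P_{S^{\lambda_{lo}}_{\text{far}}}(h) \right\|_{TV} + C_a\, I_{S^{\lambda_{lo}}_{\text{near}}}(h) \le 2\delta,$$

and by the same argument,

$$0.16^2\, C_a\, \text{SRF}^{-2} \left\| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right\|_{TV} + C_a\, I_{S^{\lambda_{hi}}_{\text{near}}}(h) \le 2\delta.$$

This finishes the proof.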

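2.2 Proof of Lemma 2.2

The proof also relies upon the low-frequency polynomial from Lemma 2.4 and the fact that $q(t)$ is close to $v_j$ when $t$ is near $t_j$. The intermediate result is proved in Section A of the Appendix.

Lemma 2.5 There is a polynomial $q$ satisfying the properties from Lemma 2.4 and, additionally,

$$|q(t) - v_j| \le \frac{C (t - t_j)^2}{\lambda_{lo}^2}, \quad \text{for all } t \in S^{\lambda_{lo}}_{\text{near}}(j).$$

Consider the polar form

$$\int_{S^{\lambda_{hi}}_{\text{near}}(j)} h(d\tau) = \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} h(d\tau) \right| e^{i\theta_j},$$

where $\theta_j \in [0, 2\pi)$. We set $v_j = e^{-i\theta_j}$ in Lemma 2.4 and apply the triangle inequality to obtain

$$\sum_{t_j \in T} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} h(d\tau) \right| = \sum_{t_j \in T} \int_{S^{\lambda_{hi}}_{\text{near}}(j)} e^{-i\theta_j}\, h(d\tau) \le \left| \int_{S^{\lambda_{hi}}_{\text{near}}} q(\tau)\, h(d\tau) \right| + \sum_{t_j \in T} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} \left( q(\tau) - e^{-i\theta_j} \right) h(d\tau) \right|. \tag{2.15}$$

By another application of the triangle inequality and (2.11),

$$\left| \int_{S^{\lambda_{hi}}_{\text{near}}} q(\tau)\, h(d\tau) \right| \le \left| \int_{\mathbb{T}} q(\tau)\, h(d\tau) \right| + \left| \int_{S^{\lambda_{hi}}_{\text{far}}} q(\tau)\, h(d\tau) \right| \le 2\delta + \left\| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right\|_{TV}. \tag{2.16}$$

To bound the remaining term in (2.15), we combine Lemma 2.5 and Hölder's inequality. With $t_j = 0$ (this is no loss of generality),

$$\left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} \left( q(t) - e^{-i\theta_j} \right) h(dt) \right| \le \int_{S^{\lambda_{hi}}_{\text{near}}(j)} \left| q(t) - e^{-i\theta_j} \right| |h|(dt) \le \int_{S^{\lambda_{hi}}_{\text{near}}(j)} \frac{C t^2}{\lambda_{lo}^2}\, |h|(dt) = C\, I_{S^{\lambda_{hi}}_{\text{near}}(j)}(h).$$

It follows from this, (2.15) and (2.16) that

$$\sum_{t_j \in T} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} h(d\tau) \right| \le 2\delta + \left\| P_{S^{\lambda_{hi}}_{\text{far}}}(h) \right\|_{TV} + C\, I_{S^{\lambda_{hi}}_{\text{near}}}(h).$$

The proof is complete.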



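2.3 Proof of Lemma 2.3

We record a simple lemma.

Lemma 2.6 For any measure $\mu$ and with $t_j = 0$,

$$\left| \int_{0.16\lambda_{hi}}^{0.16\lambda_{lo}} \tau\, \mu(d\tau) \right| \le 6.25\, \lambda_{hi}\, \text{SRF}^2\, I_{S^{\lambda_{lo}}_{\text{near}}(j)}(\mu).$$

Proof Note that in the interval $[0.16\lambda_{hi}, 0.16\lambda_{lo}]$, $\tau / (0.16\lambda_{hi}) \ge 1$, whence

$$\left| \int_{0.16\lambda_{hi}}^{0.16\lambda_{lo}} \tau\, \mu(d\tau) \right| \le \int_{0.16\lambda_{hi}}^{0.16\lambda_{lo}} \tau\, |\mu|(d\tau) \le \int_{0.16\lambda_{hi}}^{0.16\lambda_{lo}} \frac{\tau^2}{0.16\lambda_{hi}}\, |\mu|(d\tau) \le \frac{\lambda_{lo}^2}{0.16\lambda_{hi}}\, I_{S^{\lambda_{lo}}_{\text{near}}(j)}(\mu),$$

and the right-hand side equals $6.25\, \lambda_{hi}\, \text{SRF}^2\, I_{S^{\lambda_{lo}}_{\text{near}}(j)}(\mu)$ since $\lambda_{lo}^2 / \lambda_{hi} = \lambda_{hi}\, \text{SRF}^2$ and $1/0.16 = 6.25$.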
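Lemma 2.6 can be checked numerically on a discrete measure $\mu = \sum_i w_i \delta_{\tau_i}$ supported in $[0.16\lambda_{hi}, 0.16\lambda_{lo}]$. The sketch below computes both sides of the inequality, with $I$ normalized by $\lambda_{lo}^{-2}$ as in the definition earlier in this section; the function name and the sample atoms are made up for illustration.

```python
def lemma_2_6_sides(taus, weights, lam_lo, lam_hi):
    """Return (lhs, rhs) of Lemma 2.6 for the discrete measure sum_i w_i delta_{tau_i}.

    Every tau_i is assumed to lie in [0.16 * lam_hi, 0.16 * lam_lo], with t_j = 0.
    """
    if not all(0.16 * lam_hi <= t <= 0.16 * lam_lo for t in taus):
        raise ValueError("atoms must lie in [0.16 * lam_hi, 0.16 * lam_lo]")
    srf = lam_lo / lam_hi
    # Left-hand side: modulus of the first moment of mu.
    lhs = abs(sum(t * w for t, w in zip(taus, weights)))
    # I(mu): second moment of |mu| normalized by lam_lo^2.
    second_moment = sum(t * t * abs(w) for t, w in zip(taus, weights)) / lam_lo ** 2
    rhs = 6.25 * lam_hi * srf ** 2 * second_moment
    return lhs, rhs


lhs, rhs = lemma_2_6_sides(
    taus=[0.0016, 0.005, 0.016],
    weights=[1.0, -2.0 + 1.0j, 0.5j],
    lam_lo=0.1,
    lam_hi=0.01,
)
```

The inequality is tight only when all the mass sits at $\tau = 0.16\lambda_{hi}$ with aligned phases; mass farther out or with cancelling phases leaves slack, as in this example.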



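We now turn our attention to the proof of Lemma 2.3. By the triangle inequality,

$$\sum_{t_j \in T} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} (\tau - t_j)\, h(d\tau) \right| \le \sum_{t_j \in T} \left| \int_{S^{\lambda_{lo}}_{\text{near}}(j)} (\tau - t_j)\, h(d\tau) \right| + \sum_{t_j \in T} \left| \int_{0.16\lambda_{hi} < |\tau - t_j| \le 0.16\lambda_{lo}} (\tau - t_j)\, h(d\tau) \right|. \tag{2.17}$$

The second term is bounded via Lemma 2.6. For the first, we use an argument very similar to the proof of Lemma 2.2. Here, we exploit the existence of a low-frequency polynomial that is almost linear in the vicinity of the elements of $T$. The result below is proved in Section B of the Appendix.

Lemma 2.7 Suppose $T$ obeys the separation condition (1.7) and take any $v \in \mathbb{C}^{|T|}$ with $|v_j| = 1$. Then there exists a low-frequency trigonometric polynomial

$$q_1(t) = \sum_{k=-f_{lo}}^{f_{lo}} c_k\, e^{i2\pi k t}$$

obeying

$$|q_1(t) - v_j (t - t_j)| \le \frac{C_a (t - t_j)^2}{\lambda_{lo}}, \quad t \in S^{\lambda_{lo}}_{\text{near}}(j), \tag{2.18}$$

$$|q_1(t)| \le C_b\, \lambda_{lo}, \quad t \in S^{\lambda_{lo}}_{\text{far}}, \tag{2.19}$$

for positive constants $C_a$, $C_b$.

Consider the polar decomposition of

$$\int_{S^{\lambda_{lo}}_{\text{near}}(j)} (\tau - t_j)\, h(d\tau) = \left| \int_{S^{\lambda_{lo}}_{\text{near}}(j)} (\tau - t_j)\, h(d\tau) \right| e^{i\theta_j},$$

where $\theta_j \in [0, 2\pi)$, $t_j \in T$, and set $v_j = e^{-i\theta_j}$ in Lemma 2.7. Then

$$\sum_{t_j \in T} \left| \int_{S^{\lambda_{lo}}_{\text{near}}(j)} (\tau - t_j)\, h(d\tau) \right| = \sum_{t_j \in T} \int_{S^{\lambda_{lo}}_{\text{near}}(j)} e^{-i\theta_j} (\tau - t_j)\, h(d\tau) \le \sum_{t_j \in T} \left| \int_{S^{\lambda_{lo}}_{\text{near}}(j)} \left( q_1(\tau) - e^{-i\theta_j} (\tau - t_j) \right) h(d\tau) \right| + \left| \int_{S^{\lambda_{lo}}_{\text{near}}} q_1(\tau)\, h(d\tau) \right|. \tag{2.20}$$

The inequality (2.18) and Hölder's inequality allow us to bound each summand in the first term in the right-hand side of (2.20). With $t_j = 0$ (again without loss of generality),

$$\left| \int_{S^{\lambda_{lo}}_{\text{near}}(j)} \left( q_1(\tau) - e^{-i\theta_j} \tau \right) h(d\tau) \right| \le \int_{S^{\lambda_{lo}}_{\text{near}}(j)} \left| q_1(\tau) - e^{-i\theta_j} \tau \right| |h|(d\tau) \le \int_{S^{\lambda_{lo}}_{\text{near}}(j)} \frac{C_a \tau^2}{\lambda_{lo}}\, |h|(d\tau) \le C_a\, \lambda_{lo}\, I_{S^{\lambda_{lo}}_{\text{near}}(j)}(h). \tag{2.21}$$

Another application of the triangle inequality yields

$$\left| \int_{S^{\lambda_{lo}}_{\text{near}}} q_1(\tau)\, h(d\tau) \right| \le \left| \int_{\mathbb{T}} q_1(\tau)\, h(d\tau) \right| + \left| \int_{S^{\lambda_{lo}}_{\text{far}}} q_1(\tau)\, h(d\tau) \right|. \tag{2.22}$$

We employ Hölder's inequality, (2.11), (2.18) and (2.19) to bound each of the terms in the right-hand side. First,

$$\left| \int_{\mathbb{T}} q_1(\tau)\, h(d\tau) \right| \le \|q_1\|_{L_\infty} \|Q_{lo}\, h\|_{L_1} \le C\, \lambda_{lo}\, \delta. \tag{2.23}$$

Second,

$$\left| \int_{S^{\lambda_{lo}}_{\text{far}}} q_1(\tau)\, h(d\tau) \right| \le \left\| P_{S^{\lambda_{lo}}_{\text{far}}}(q_1) \right\|_{L_\infty} \left\| P_{S^{\lambda_{lo}}_{\text{far}}}(h) \right\|_{TV} \le C_b\, \lambda_{lo} \left\| P_{S^{\lambda_{lo}}_{\text{far}}}(h) \right\|_{TV}. \tag{2.24}$$

Combining (2.17) with these estimates gives

$$\sum_{t_j \in T} \left| \int_{S^{\lambda_{hi}}_{\text{near}}(j)} (\tau - t_j)\, h(d\tau) \right| \le C \left( \lambda_{lo}\, \delta + \lambda_{lo} \left\| P_{S^{\lambda_{lo}}_{\text{far}}}(h) \right\|_{TV} + \lambda_{lo}\, I_{S^{\lambda_{lo}}_{\text{near}}}(h) + \lambda_{hi}\, \text{SRF}^2\, I_{S^{\lambda_{lo}}_{\text{near}}}(h) \right),$$

as desired.

3 Discussion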



We have shown that we could extrapolate the spectrum of a superposition of point sources by convex programming and that the extrapolation error scales quadratically with the super-resolution factor. This is a worst-case analysis since the noise has bounded norm but is otherwise arbitrary. Natural extensions would include stability studies using other error metrics and noise models. For instance, an analysis tailored to a stochastic model might be able to sharpen Corollary 1.3 and be more precise in its findings. In a different direction, our techniques may be directly applicable to related problems. An example concerns the use of the total-variation norm for denoising line spectra [1]. Here, it would be interesting to see whether our methods allow us to prove better denoising performance under a minimum-separation condition. Another example concerns the recovery of sparse signals from a random subset of their low-pass Fourier coefficients [37]. Here, it is likely that our work would yield stability guarantees from noisy low-frequency data.

On the algorithmic side, suppose we use the $L_2$ norm to constrain the feasible set,

$$\min_{\tilde x} \ \|\tilde x\|_{TV} \quad \text{subject to} \quad \|Q_{lo}\, \tilde x - y\|_{L_2} \le \delta. \tag{3.1}$$

Then the dual problem takes the form

$$\max_{u \in \mathbb{C}^n} \ \text{Re}\left[ (F_{lo}\, y)^* u \right] - \delta \|u\|_2 \quad \text{subject to} \quad \|F_{lo}^*\, u\|_{L_\infty} \le 1,$$

where $F_{lo}$ denotes the linear operator that maps a function to its first $n := 2f_{lo} + 1$ Fourier coefficients as in (1.8), so that $Q_{lo} = F_{lo}^* F_{lo}$. The dual can be recast as the semidefinite program (SDP)

$$\max_{u,\, Q} \ \text{Re}\left[ (F_{lo}\, y)^* u \right] - \delta \|u\|_2 \quad \text{subject to} \quad \begin{bmatrix} Q & u \\ u^* & 1 \end{bmatrix} \succeq 0, \quad \sum_{i=1}^{n-j} Q_{i,\, i+j} = \begin{cases} 1, & j = 0, \\ 0, & j = 1, 2, \ldots, n-1, \end{cases} \tag{3.2}$$

where $Q$ is an $n \times n$ Hermitian matrix, leveraging a corollary to Theorem 4.24 in [8] (see also [1, 4, 37]). In most cases, this allows one to solve the primal problem with high accuracy.

Lemma 3.1 Let $(x_{est}, u_{est})$ be a primal-dual pair of solutions to (3.1)-(3.2). For any $t \in \mathbb{T}$ with $x_{est}(t) \neq 0$,

$$(F_{lo}^*\, u_{est})(t) = \text{sign}(x_{est}(t)).$$

Proof First, we can assume that $y$ is low pass in the sense that $Q_{lo}\, y = y$. Since $x_{est}$ is feasible, $\|F_{lo}(y - x_{est})\|_{\ell_2} = \|y - Q_{lo}\, x_{est}\|_{L_2} \le \delta$. Second, strong duality holds here. Hence, the Cauchy-Schwarz inequality gives

$$\|x_{est}\|_{TV} = \text{Re}\left[ (F_{lo}\, y)^* u_{est} \right] - \delta \|u_{est}\|_2 = \langle F_{lo}\, x_{est}, u_{est} \rangle + \langle F_{lo}\, y - F_{lo}\, x_{est}, u_{est} \rangle - \delta \|u_{est}\|_2 \le \langle x_{est}, F_{lo}^*\, u_{est} \rangle.$$

By Hölder's inequality and the constraint on $F_{lo}^*\, u_{est}$, $\|x_{est}\|_{TV} \ge \langle x_{est}, F_{lo}^*\, u_{est} \rangle$ so that equality holds. This is only possible if $F_{lo}^*\, u_{est}$ equals the sign of $x_{est}$ at every point where $x_{est}$ is nonzero. ∎

This result implies that it is usually possible to determine the support of the primal solution by locating those points where the polynomial $q(t) = (F_{lo}^*\, u_{est})(t)$ has modulus equal to one. Once the support is estimated accurately, a solution to the primal problem can be found by solving a discrete problem. Figure 3 shows the result of applying this scheme to a simple example. We omit further details and defer the analysis of this approach to future work.
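The support-localization step described above can be sketched as a grid search for the points where a trigonometric polynomial reaches modulus one. This is only an illustration under stated assumptions: the dual solution is taken as given (solving the SDP (3.2) is not shown), the grid size and tolerance are arbitrary choices, and the function names are hypothetical.

```python
import cmath
import math


def trig_poly(t, coeffs):
    """q(t) = sum_k c_k exp(i 2 pi k t), with coeffs a dict mapping k to c_k."""
    return sum(c * cmath.exp(2j * math.pi * k * t) for k, c in coeffs.items())


def locate_unit_modulus(coeffs, grid_size=2048, tol=1e-3):
    """Grid points that are local maxima of |q| with modulus at least 1 - tol."""
    vals = [abs(trig_poly(i / grid_size, coeffs)) for i in range(grid_size)]
    peaks = []
    for i in range(grid_size):
        prev_val = vals[i - 1]                  # index -1 wraps around the circle
        next_val = vals[(i + 1) % grid_size]
        if vals[i] >= 1.0 - tol and vals[i] >= prev_val and vals[i] > next_val:
            peaks.append(i / grid_size)
    return peaks


# Toy stand-in for F_lo^* u_est: q(t) = cos(2 pi t), whose modulus is one at 0 and 1/2.
toy_coeffs = {1: 0.5, -1: 0.5}
estimated_support = locate_unit_modulus(toy_coeffs)
```

In practice one would refine each grid peak (for instance by polynomial root finding) before solving the discrete problem for the amplitudes; the toy polynomial here is not a dual solution of any particular instance and only exercises the peak search.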
Figure 3: (a) Original support of a signal obeying the minimum-separation condition (black) along with its low-pass projection before (blue) and after adding noise (red). The low-pass projection is obtained by truncating the spectrum of the signal to its first 101 Fourier coefficients. The noise added to the noiseless Fourier coefficients is i.i.d. Gaussian with amplitude giving a signal-to-noise ratio of 31.5 dB. (b) Original signal (blue) and estimate obtained by solving the SDP (3.2) (red).

A Proof of Lemma 2.5



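We use the construction described in Section 2 of [4]. In more detail,

$$q(t) = \sum_{t_k \in T} \alpha_k\, G(t - t_k) + \beta_k\, G^{(1)}(t - t_k),$$

where $\alpha, \beta \in \mathbb{C}^{|T|}$ are coefficient vectors,

$$G(t) = \left[ \frac{\sin\left( \left( \frac{f_{lo}}{2} + 1 \right) \pi t \right)}{\left( \frac{f_{lo}}{2} + 1 \right) \sin(\pi t)} \right]^4, \quad t \in \mathbb{T} \setminus \{0\}, \tag{A.1}$$

and $G(0) = 1$; here, $G^{(\ell)}$ is the $\ell$th derivative of $G$. If $f_{lo}$ is even, $G(t)$ is the square of the Fejér kernel. By construction, the coefficients $\alpha$ and $\beta$ are selected such that for all $t_j \in T$,

$$q(t_j) = v_j, \qquad q'(t_j) = 0.$$

Without loss of generality we consider $t_j = 0$ and bound $q(t) - v_j$ in the interval $[0, 0.16\lambda_{lo}]$. To ease notation, we define $w(t) = q(t) - v_j = w_R(t) + i\, w_I(t)$, where $w_R$ is the real part of $w$ and $w_I$ the imaginary part. Leveraging different results from Section 2 in [4] (in particular equations (2.23) and (2.25) and Lemmas 2.2 and 2.7), we have

$$|w''_R(t)| = \left| \sum_{t_k \in T} \text{Re}(\alpha_k)\, G^{(2)}(t - t_k) + \sum_{t_k \in T} \text{Re}(\beta_k)\, G^{(3)}(t - t_k) \right|$$

$$\le \|\alpha\|_{\infty} \sum_{t_k \in T} \left| G^{(2)}(t - t_k) \right| + \|\beta\|_{\infty} \sum_{t_k \in T} \left| G^{(3)}(t - t_k) \right|$$

$$\le C_\alpha \left( \left| G^{(2)}(t) \right| + \sum_{t_k \in T \setminus \{0\}} \left| G^{(2)}(t - t_k) \right| \right) + C_\beta\, \lambda_{lo} \left( \left| G^{(3)}(t) \right| + \sum_{t_k \in T \setminus \{0\}} \left| G^{(3)}(t - t_k) \right| \right)$$

$$\le C f_{lo}^2.$$

The same bound holds for $w_I$. Since $w_R(0)$, $w'_R(0)$, $w_I(0)$ and $w'_I(0)$ are all equal to zero, this implies $|w_R(t)| \le C' f_{lo}^2\, t^2$ and $|w_I(t)| \le C' f_{lo}^2\, t^2$ in the interval of interest, which allows the conclusion

$$|w(t)| \le C f_{lo}^2\, t^2.$$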



B Proof of Lemma 2.7



The proof is similar to that of Lemma 2.4 (see Section 2 of [4]), where a low-frequency kernel and its derivative
are used to interpolate an arbitrary sign pattern on a support satisfying the minimum-distance condition.
More precisely, we set

$$q_1(t) = \sum_{t_k \in T} \alpha_k G(t - t_k) + \beta_k G^{(1)}(t - t_k), \tag{B.1}$$

where $\alpha, \beta \in \mathbb{C}^{|T|}$ are coefficient vectors and $G$ is defined by (A.1). Note that $G$, $G^{(1)}$ and, consequently, $q_1$ are
trigonometric polynomials of degree at most $f_{\text{lo}}$. By Lemma 2.7 in [4], it holds that for any $t_0 \in T$ and $t \in \mathbb{T}$
obeying $|t - t_0| \le 0.16\lambda_{\text{lo}}$,

$$\sum_{t_k \in T \setminus \{t_0\}} |G^{(\ell)}(t - t_k)| \le C_\ell f_{\text{lo}}^\ell, \tag{B.2}$$
where $C_\ell$ is a positive constant for $\ell = 0, 1, 2, 3$; in particular, $C_0 \le 0.007$, $C_1 \le 0.08$ and $C_2 \le 1.06$. In
addition, there exist other positive constants $C_0'$ and $C_1'$, such that for all $t_0 \in T$ and $t \in \mathbb{T}$ with $|t - t_0| \le \Delta/2$,

$$\sum_{t_k \in T \setminus \{t_0\}} |G^{(\ell)}(t - t_k)| \le C_\ell' f_{\text{lo}}^\ell \tag{B.3}$$

for $\ell = 0, 1$. We refer to Section 2.3 in [4] for a detailed description of how to compute these bounds.



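In order to satisfy (2.18) and (2.19), we constrain $q_1$ as follows: for each $t_j \in T$,

$$q_1(t_j) = 0, \qquad q_1'(t_j) = v_j.$$

Intuitively, this forces $q_1$ to approximate the linear function $v_j (t - t_j)$ around $t_j$. These constraints can be
expressed in matrix form,

$$\begin{bmatrix} D_0 & D_1 \\ D_1 & D_2 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} 0 \\ v \end{bmatrix},$$

where

$$(D_0)_{jk} = G(t_j - t_k), \quad (D_1)_{jk} = G^{(1)}(t_j - t_k), \quad (D_2)_{jk} = G^{(2)}(t_j - t_k),$$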

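and $j$ and $k$ range from 1 to $|T|$. It is shown in Section 2.3.1 of [4] that under the minimum-separation
condition this system is invertible, so that $\alpha$ and $\beta$ are well defined. These coefficient vectors can consequently
be expressed as

$$\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} -D_0^{-1} D_1 \\ I \end{bmatrix} S^{-1} v, \qquad S := D_2 - D_1 D_0^{-1} D_1,$$

where $S$ is the Schur complement. Inequality (B.2) implies

$$\|I - D_0\|_\infty \le C_0, \tag{B.4}$$

$$\|D_1\|_\infty \le C_1 f_{\text{lo}}, \tag{B.5}$$

$$\|\kappa I - D_2\|_\infty \le C_2 f_{\text{lo}}^2, \tag{B.6}$$

where $\kappa = |G^{(2)}(0)| = \pi^2 f_{\text{lo}} (f_{\text{lo}} + 4)/3$.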
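The construction of $q_1$ through the Schur complement can be reproduced numerically on a small example. The following is a minimal sketch under stated assumptions: $f_{\text{lo}}$ is even so that $G$ is a squared Fejér kernel with explicit Fourier coefficients, the support is an equispaced toy set that comfortably satisfies the separation condition, and the names `kernel_derivative` and `q1` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def kernel_derivative(t, f_lo, order):
    """order-th derivative of G at t, computed from the Fourier
    coefficients of the squared Fejer kernel (assumes f_lo even)."""
    m = f_lo // 2
    fejer = (1 - np.abs(np.arange(-m, m + 1)) / (m + 1)) / (m + 1)
    coeffs = np.convolve(fejer, fejer).astype(complex)
    k = np.arange(-f_lo, f_lo + 1)
    for _ in range(order):
        coeffs = coeffs * (2j * np.pi * k)   # differentiate term by term
    t = np.asarray(t, dtype=float)
    return (np.exp(2j * np.pi * t[..., None] * k) @ coeffs).real

f_lo = 40
lam_lo = 1 / f_lo
support = np.arange(8) / 8 + 0.013           # spacing 0.125 > 2 * lam_lo
v = np.exp(1j * np.arange(8))                # unit-modulus sign pattern

diff = support[:, None] - support[None, :]
D0, D1, D2 = (kernel_derivative(diff, f_lo, order) for order in range(3))

# Solve the block system via the Schur complement S = D2 - D1 D0^{-1} D1.
S = D2 - D1 @ np.linalg.solve(D0, D1)
beta = np.linalg.solve(S, v)
alpha = -np.linalg.solve(D0, D1 @ beta)

def q1(t, order=0):
    """q_1 (or its derivative of the given order) evaluated at the points t."""
    d = np.asarray(t, dtype=float)[:, None] - support[None, :]
    return (kernel_derivative(d, f_lo, order) @ alpha
            + kernel_derivative(d, f_lo, order + 1) @ beta)

kappa = np.pi ** 2 * f_lo * (f_lo + 4) / 3
second_derivative_at_zero = kernel_derivative(0.0, f_lo, 2)
```

On this example `q1(support)` vanishes and `q1(support, order=1)` reproduces `v` up to rounding error, which is exactly what the two interpolation constraints demand. The last two lines let one compare the magnitude of $G^{(2)}(0)$ with the closed-form value of $\kappa$.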



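Let $\|M\|_\infty$ denote the usual infinity norm of a matrix $M$, defined as $\|M\|_\infty = \max_{\|x\|_\infty = 1} \|Mx\|_\infty = \max_i \sum_j |a_{ij}|$. Then, if $\|I - M\|_\infty < 1$, the series $M^{-1} = (I - (I - M))^{-1} = \sum_{k \ge 0} (I - M)^k$ is convergent and we have

$$\|M^{-1}\|_\infty \le \frac{1}{1 - \|I - M\|_\infty}.$$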
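The Neumann-series bound is easy to sanity-check on a random matrix. The sketch below is purely illustrative; the matrix size, the seed and the value 0.3 for $\|I - M\|_\infty$ are arbitrary choices, not quantities from the proof.

```python
import numpy as np

def inf_norm(M):
    """Matrix infinity norm: maximum absolute row sum."""
    return np.max(np.sum(np.abs(M), axis=1))

rng = np.random.default_rng(0)
n = 6
E = rng.uniform(-1, 1, size=(n, n))
E *= 0.3 / inf_norm(E)            # rescale so that ||I - M||_inf = 0.3 < 1
M = np.eye(n) - E                 # hence I - M = E

# Partial sums of the Neumann series sum_k (I - M)^k.
partial = np.zeros((n, n))
term = np.eye(n)
for _ in range(60):
    partial += term
    term = term @ E

M_inv = np.linalg.inv(M)
series_error = inf_norm(partial - M_inv)
lhs = inf_norm(M_inv)
rhs = 1 / (1 - inf_norm(np.eye(n) - M))
```

The partial sums converge geometrically to the inverse, and `lhs` never exceeds `rhs`, in line with the displayed inequality.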



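This, together with (B.4), (B.5) and (B.6), implies

$$\|D_0^{-1}\|_\infty \le \frac{1}{1 - \|I - D_0\|_\infty} \le \frac{1}{1 - C_0},$$

$$\|\kappa I - S\|_\infty \le \|\kappa I - D_2\|_\infty + \|D_1\|_\infty \|D_0^{-1}\|_\infty \|D_1\|_\infty \le \left( C_2 + \frac{C_1^2}{1 - C_0} \right) f_{\text{lo}}^2,$$

$$\|S^{-1}\|_\infty \le \frac{1}{\kappa - \|\kappa I - S\|_\infty} \le \left( \kappa - \left( C_2 + \frac{C_1^2}{1 - C_0} \right) f_{\text{lo}}^2 \right)^{-1} \le C_\kappa \lambda_{\text{lo}}^2,$$

for a certain positive constant $C_\kappa$. Note that due to the numeric upper bounds on the constants in (B.2), $C_\kappa$
is indeed a positive constant as long as $f_{\text{lo}} \ge 1$. Finally, we obtain a bound on the magnitude of the entries
of $\alpha$,

$$\|\alpha\|_\infty = \|D_0^{-1} D_1 S^{-1} v\|_\infty \le \|D_0^{-1} D_1 S^{-1}\|_\infty \le \|D_0^{-1}\|_\infty \|D_1\|_\infty \|S^{-1}\|_\infty \le C_\alpha \lambda_{\text{lo}}, \tag{B.7}$$

where $C_\alpha = C_\kappa C_1 / (1 - C_0)$, and on the entries of $\beta$,

$$\|\beta\|_\infty = \|S^{-1} v\|_\infty \le \|S^{-1}\|_\infty \le C_\beta \lambda_{\text{lo}}^2, \tag{B.8}$$

for a positive constant $C_\beta = C_\kappa$. Combining these inequalities with (B.3) and the fact that the absolute
values of $G(t)$ and $G^{(1)}(t)$ are bounded by one and $7 f_{\text{lo}}$ respectively (see the proof of Lemma C.5 in [4]), we
have that for any $t$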



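$$
\begin{aligned}
|q_1(t)| &= \Big| \sum_{t_k \in T} \alpha_k G(t - t_k) + \sum_{t_k \in T} \beta_k G^{(1)}(t - t_k) \Big| \\
&\le \|\alpha\|_\infty \sum_{t_k \in T} |G(t - t_k)| + \|\beta\|_\infty \sum_{t_k \in T} |G^{(1)}(t - t_k)| \\
&\le C_\alpha \lambda_{\text{lo}} \Big( |G(t - t_i)| + \sum_{t_k \in T \setminus \{t_i\}} |G(t - t_k)| \Big) + C_\beta \lambda_{\text{lo}}^2 \Big( |G^{(1)}(t - t_i)| + \sum_{t_k \in T \setminus \{t_i\}} |G^{(1)}(t - t_k)| \Big) \\
&\le C \lambda_{\text{lo}},
\end{aligned}
\tag{B.9}
$$

where $t_i$ denotes the element in $T$ nearest to $t$ (note that all other elements are at least $\Delta/2$ away). Thus,
(2.19) holds.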
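The positivity of $C_\kappa$ and the resulting values of $C_\alpha$ and $C_\beta$ can be checked with a few lines of arithmetic. The sketch below is an illustration only: it plugs in the stated upper bounds on $C_0$, $C_1$ and $C_2$ and uses the lower bound $\kappa \ge (\pi^2/3) f_{\text{lo}}^2$, which follows from the formula for $\kappa$. The function name `schur_constants` is hypothetical, and the returned values are one admissible choice of constants, not necessarily the ones the authors had in mind.

```python
import math

def schur_constants(c0=0.007, c1=0.08, c2=1.06):
    """Admissible (C_kappa, C_alpha, C_beta) given upper bounds on C_0, C_1, C_2.

    Uses kappa = pi^2 f_lo (f_lo + 4) / 3 >= (pi^2 / 3) f_lo^2, so that
    kappa - (C_2 + C_1^2 / (1 - C_0)) f_lo^2 >= denom * f_lo^2.
    """
    kappa_over_f2 = math.pi ** 2 / 3
    schur_gap = c2 + c1 ** 2 / (1 - c0)
    denom = kappa_over_f2 - schur_gap
    if denom <= 0:
        raise ValueError("constants too large: the Neumann bound does not apply")
    c_kappa = 1 / denom                 # ||S^{-1}|| <= c_kappa * lambda_lo^2
    c_alpha = c_kappa * c1 / (1 - c0)   # ||alpha||  <= c_alpha * lambda_lo
    c_beta = c_kappa                    # ||beta||   <= c_beta  * lambda_lo^2
    return c_kappa, c_alpha, c_beta

c_kappa, c_alpha, c_beta = schur_constants()
```

With the stated bounds the denominator is roughly 2.2, so this choice gives $C_\kappa$ of about 0.45 and $C_\alpha$ of about 0.036.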



The proof is completed by the following lemma, which proves (2.18). 



Lemma B.1 For any $t_j \in T$ and $t \in \mathbb{T}$ obeying $|t - t_j| \le 0.16\lambda_{\text{lo}}$, we have

$$|q_1(t) - v_j (t - t_j)| \le \frac{C (t - t_j)^2}{\lambda_{\text{lo}}}.$$



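Proof We assume without loss of generality that $t_j = 0$. By symmetry, it suffices to show the claim for
$t \in (0, 0.16\lambda_{\text{lo}}]$. To ease notation, we define $w(t) = v_j t - q_1(t) = w_R(t) + i w_I(t)$, where $w_R$ is the real part
of $w$ and $w_I$ the imaginary part. Leveraging (B.7), (B.8) and (B.2) together with the fact that $G^{(2)}(t)$ and
$G^{(3)}(t)$ are bounded by $4 f_{\text{lo}}^2$ and $6 f_{\text{lo}}^3$ respectively if $|t| \le 0.16\lambda_{\text{lo}}$ (see the proof of Lemma 2.3 in [4]), we
obtain

$$
\begin{aligned}
|w_R''(t)| &= \Big| \sum_{t_k \in T} \mathrm{Re}(\alpha_k)\, G^{(2)}(t - t_k) + \sum_{t_k \in T} \mathrm{Re}(\beta_k)\, G^{(3)}(t - t_k) \Big| \\
&\le \|\alpha\|_\infty \sum_{t_k \in T} |G^{(2)}(t - t_k)| + \|\beta\|_\infty \sum_{t_k \in T} |G^{(3)}(t - t_k)| \\
&\le C_\alpha \lambda_{\text{lo}} \Big( |G^{(2)}(t)| + \sum_{t_k \in T \setminus \{0\}} |G^{(2)}(t - t_k)| \Big) + C_\beta \lambda_{\text{lo}}^2 \Big( |G^{(3)}(t)| + \sum_{t_k \in T \setminus \{0\}} |G^{(3)}(t - t_k)| \Big) \\
&\le C f_{\text{lo}}.
\end{aligned}
$$

The same bound applies to $w_I$. Since $w_R(0)$, $w_R'(0)$, $w_I(0)$ and $w_I'(0)$ are all equal to zero, this implies
$|w_R(t)| \le C f_{\text{lo}} t^2$ (and similarly for $w_I$) in the interval of interest. Whence, $|w(t)| \le C f_{\text{lo}} t^2$. ■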



C Extension to multiple dimensions 



Lemmas 2.4 (together with Lemma 2.5) and 2.7 construct bounded low-frequency polynomials which interpolate
a sign pattern on a well-separated set of points S and have bounded second derivatives in a neighborhood
of S. In order to extend our results all we need is to prove their multidimensional analogs (in this case,
instead of bounding the second derivative, we must bound the eigenvalues of the Hessian matrix). One can
proceed in a way similar to the proof of Lemmas 2.4 and 2.7, namely, by using a low-frequency kernel constructed
by tensorizing several squared Fejér kernels to interpolate the sign pattern, while constraining the
first-order derivatives to either vanish or have a fixed value. To do this, we can set up a system of equations
and prove that it is well conditioned using the rapid decay of the interpolation kernel away from the origin.
Finally, one can verify that the construction satisfies the required conditions by exploiting the fact that the
interpolation kernel and its derivatives are locally quadratic and rapidly decaying. This is spelled out in the
proof of Proposition C.1 in [4] to prove a version of Lemma 2.4 in two dimensions.